Module: Regression and Logarithmic Relationships

By the end of this module, you should be able to:

Logarithmic Axes

Previously, you learned how to do regression analysis, the process of answering the question “What is the relationship between variables X and Y?” This process consists of graphing standard curves, finding their equations, and checking \( R^2 \) values.
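For reference, \( R^2 \) compares how far the observed responses \( y_i \) lie from the fitted values \( \hat{y}_i \) with how far they lie from their mean \( \bar{y} \). Values close to 1 mean the fitted line explains nearly all of the variation in the response:

\[
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\]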

So far, we have only explored variables with linear relationships; however, relationships between variables are not always linear.

In this module we will use a fictitious data frame on the Asian Carp population in Lake Superior. Consider the following story. Several years ago, a plane traveling towards the Minneapolis/St. Paul International Airport crashed into Lake Superior. The plane was carrying several different species for research purposes at the University of Minnesota. There were 11 Asian Carp on board the plane, and all survived the crash. Scientists recorded the Asian Carp population in Lake Superior in each month following the crash (for more information on the Asian Carp's rapid population growth in non-native areas, see http://en.wikipedia.org/wiki/Asian_carp).

The Asian Carp data frame contains two variables, month and population, and fifty cases.

First, load the mosaic and ggplot2 packages:

library(mosaic)
library(ggplot2)

Now, load the data frame and view its first six rows:

fishdata = fetchGoogle("https://docs.google.com/spreadsheet/pub?key=0AnFamthOzwySdGJfb0ZpcEt1SHlsSHVrU19xSm9tc0E&output=csv")
head(fishdata)
##   month population
## 1     1         11
## 2     2         11
## 3     3         11
## 4     4         13
## 5     5         15
## 6     6         13

Let's look at a scatterplot of population by month:

ggplot(fishdata, aes(month, population)) + geom_point() + labs(title = "Population by Month", x = "Month", y = "Population") + stat_smooth(method = "lm", se = FALSE)

[Figure: scatterplot "Population by Month" (x: Month, y: Population) with a fitted straight line]

Notice that the standard curve plotted on the graph does not fit the points well, which tells us the relationship between month and population is not linear. Instead, the population appears to increase at an exponential rate. In this case, we can rescale one axis so that a straight-line standard curve reflects the variables' relationship.

How do we know what the scale should be? Again, look at the graph. As the months progress, the population rises, and the rate of increase itself grows over time. This pattern suggests an exponential relationship: the population is multiplied by roughly the same factor each month. Taking the logarithm of an exponential relationship turns it into a linear one (if you need a refresher on logarithms, search 'logarithm' in Google). Thus, if you convert the y-axis (population) to \( \log_{10} \) scale, you should observe a linear relationship between the variables. The R code to do this is below.
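A short derivation shows why the log transformation works. Suppose, as an illustrative assumption, that the population follows an exponential model with a starting value \( a \) and a constant monthly growth factor \( b \). Taking \( \log_{10} \) of both sides gives:

\[
\text{population} = a \cdot b^{\text{month}}
\quad\Longrightarrow\quad
\log_{10}(\text{population}) = \log_{10}(a) + \log_{10}(b) \cdot \text{month}
\]

The right-hand side is a straight line in month, with intercept \( \log_{10}(a) \) and slope \( \log_{10}(b) \). That is exactly the kind of equation a linear model of \( \log_{10}(\text{population}) \) on month can estimate.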

ggplot(fishdata, aes(month, log10(population))) + geom_point() + labs(title = "Scatterplot of Log10(Population) by Month", x = "Month", y = "Log10(Population)") + stat_smooth(method = "lm", se = FALSE)

[Figure: scatterplot "Scatterplot of Log10(Population) by Month" (x: Month, y: Log10(Population)) with a fitted straight line]
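The linearizing effect of the log transformation can also be checked numerically. The sketch below does not use fishdata. Instead it builds an assumed, noise-free population, 11 × 1.06^month for months 1 to 50 (exactly 6% growth per month). It then compares \( R^2 \) for a straight-line fit on the original scale and on the \( \log_{10} \) scale, using only base R. The helper r_squared() is our own illustrative function, not part of any package:

```r
# Synthetic illustration (NOT the fishdata values): exponential growth
# of exactly 6% per month, for months 1 to 50.
month <- 1:50
population <- 11 * 1.06^month

# Hypothetical helper: R-squared of a simple linear fit of y on x,
# computed as 1 - (residual sum of squares) / (total sum of squares)
r_squared <- function(x, y) {
  fit <- lm(y ~ x)
  1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
}

r2_raw <- r_squared(month, population)         # straight line on the original scale
r2_log <- r_squared(month, log10(population))  # straight line after taking log10

# On the log scale, the slope equals log10 of the monthly growth factor
slope_log <- coef(lm(log10(population) ~ month))[["month"]]

r2_raw
r2_log     # 1, up to floating-point error
slope_log  # log10(1.06), about 0.0253
```

On the original scale the straight line misses the curvature, so its \( R^2 \) falls short of 1. On the log scale the fit is perfect, and the slope recovers \( \log_{10}(1.06) \): a constant growth factor becomes a constant slope once we take logs.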

It appears the relationship between log10(population) and month is linear. Therefore, we can now build a linear model to find the equation of the standard curve between log10(population) and month. We will call our model 'fishmod1', and then use the regressionAnalysis() function to summarize the model.

fishmod1 = lm(log10(population) ~ month, data = fishdata)
regressionAnalysis(fishmod1)
## $Model.Values
##       terms coefficients p.values
## 1 Intercept       0.9997        0
## 2     month       0.0250        0
## 
## $R.Squared
## [1] 0.984
## 
## $Equation
## [1] "The equation of the linear model is: log10(population) = 0.9997 + 0.025 * month"

According to the regressionAnalysis() output, the model equation is:

\[ \log_{10}(\text{population}) \approx 1 + 0.025 \times \text{month} \]
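Because the model is linear on the \( \log_{10} \) scale, raising 10 to the power of both sides turns it back into an exponential equation for population itself: \( \text{population} \approx 10^{0.9997} \times (10^{0.025})^{\text{month}} \). The sketch below does this arithmetic in base R. The coefficients are typed in by hand from the regressionAnalysis() output above. With the model loaded, coef(fishmod1) would return the unrounded values instead:

```r
# Coefficients copied from the regressionAnalysis() output above
intercept <- 0.9997
slope <- 0.0250

# Back-transform: population = 10^intercept * (10^slope)^month
a <- 10^intercept  # model's estimated population at month 0
b <- 10^slope      # estimated monthly growth factor

a  # about 9.99
b  # about 1.059, i.e. roughly 5.9% growth per month

# Months for the population to double: solve b^t = 2, so t = log10(2) / slope
doubling_time <- log10(2) / slope
doubling_time  # about 12.04
```

On this fictitious data, the fitted model therefore says the population grows by roughly 5.9% per month and doubles about every 12 months.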

The inversePredict() function

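We can also use the inversePredict() function to predict the value of the explanatory variable from a value of the response variable (which is why it is called an inverse prediction). It takes the fitted model followed by a value of the response variable, here on the \( \log_{10} \) scale, as in the example below.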

For example, to determine in which month log10(population) reached 2 (a population of 100), the inversePredict() function gives an answer of about month 40:

inversePredict(fishmod1, 2)
## $inversePrediction
## [1] 40.06
## 
## $Graph
## Error: object 'mod' not found
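Because the model is a straight line on the log scale, the inverse prediction can also be computed directly by solving the model equation for month: \( \text{month} = (\log_{10}(\text{population}) - b_0) / b_1 \). The sketch below does this with a small helper, inverse_predict_month(). This is a hypothetical name of our own, not part of mosaic or any other package. It uses the rounded coefficients from the regressionAnalysis() output, so its answer differs slightly from the 40.06 reported above, which presumably comes from the unrounded coefficients:

```r
# Hypothetical helper (not from any package): solve
#   log10(population) = b0 + b1 * month
# for month, given a target value of log10(population).
inverse_predict_month <- function(b0, b1, log_pop) {
  (log_pop - b0) / b1
}

# Rounded coefficients from the regressionAnalysis() output above
b0 <- 0.9997
b1 <- 0.0250

# Month at which log10(population) reaches 2, i.e. a population of 100
inverse_predict_month(b0, b1, 2)           # about 40.01
inverse_predict_month(b0, b1, log10(100))  # same target, written as a population
```

With the model loaded, b0 and b1 could be taken from coef(fishmod1) instead of being typed in by hand.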